
    Learning on a Budget Using Distributional RL

    Agents acting in real-world scenarios often face constraints such as finite budgets or daily job performance targets. While repeated (episodic) tasks can be solved with existing RL algorithms, these methods need to be extended when the repetition itself depends on performance. Recent work has introduced a distributional perspective on reinforcement learning, providing a model of episodic returns. Inspired by these results, we contribute the new budget- and risk-aware distributional reinforcement learning (BRAD-RL) algorithm, which bootstraps from the C51 distributional output and then uses value iteration to estimate the value of starting an episode with a certain amount of budget. With this strategy we can select actions in a budget-aware way within each episode and maximize the return across episodes. Experiments in a grid-world domain highlight the benefits of our algorithm, which maximizes discounted future returns when low cumulative performance may terminate repetition.
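The idea of running value iteration over budget levels can be illustrated with a toy sketch. This is not the BRAD-RL algorithm itself: it assumes that each candidate policy already comes with a discrete distribution over integer episodic returns (the kind of output a distributional learner such as C51 approximates), that the budget changes by the episode's return, and that repetition stops once the budget reaches zero. The function name `budget_values`, the policy names, and all numbers are hypothetical.

```python
def budget_values(policies, max_budget, gamma=0.9, sweeps=200):
    """Value iteration over integer budget levels (illustrative only).

    policies: dict mapping a hypothetical policy name to a list of
    (episodic_return, probability) pairs with integer returns.
    Returns (values, choice), both indexed by budget.
    """
    values = [0.0] * (max_budget + 1)   # values[0] stays 0: no budget, no more episodes
    choice = [None] * (max_budget + 1)
    for _ in range(sweeps):
        new_values = [0.0] * (max_budget + 1)
        for budget in range(1, max_budget + 1):
            best_q = float("-inf")
            for name, dist in policies.items():
                q = 0.0
                for ret, prob in dist:
                    # Assumption: the return is added to the budget, capped at max_budget.
                    next_budget = min(max_budget, budget + ret)
                    # Assumption: a non-positive budget ends the repetition.
                    future = values[next_budget] if next_budget > 0 else 0.0
                    q += prob * (ret + gamma * future)
                if q > best_q:
                    best_q, choice[budget] = q, name
            new_values[budget] = best_q
        values = new_values
    return values, choice


# Two made-up return distributions: a safe policy and a riskier one
# with a higher expected return but a chance of losing budget.
policies = {"safe": [(1, 1.0)], "risky": [(-2, 0.5), (5, 0.5)]}
values, choice = budget_values(policies, max_budget=10)
```

In this toy setting the risky policy has the higher expected return per episode, yet the computed choice at budgets 1 and 2 is the safe policy, because a loss there would end the repetition; with a full budget the risky policy is preferred.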

    An exploration strategy for non-stationary opponents

    The success or failure of any learning algorithm is partially due to the exploration strategy it uses. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent's objective is to learn a model of the opponent's strategy in order to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) to eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent's switch and learn a new model in terms of finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally that using DE outperforms state-of-the-art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.

    Efficiently detecting switches against non-stationary opponents

    Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real-world scenarios where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in these cases by learning the appropriate response, without any prior policies on how to act. Thus, we focus on the problem in which another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, which poses a problem for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that model to obtain an optimal policy and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability. We also provide empirical results showing that our approach outperforms state-of-the-art algorithms, first in normal-form games such as the prisoner's dilemma and then in a more realistic scenario, the Power TAC simulator.
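The learn, monitor, re-learn loop described in this abstract can be sketched as follows. This is a minimal illustration and not DriftER itself, whose statistical test is not given in the abstract. It assumes a deliberately simple opponent model (the most frequent action seen during a learning phase) and a hypothetical rule that flags a switch when the prediction error rate over a sliding window exceeds a threshold. The class name `SwitchDetector` and all parameter values are assumptions.

```python
from collections import Counter, deque


class SwitchDetector:
    """Toy switch detector: learn a model, monitor its error rate, re-learn on drift."""

    def __init__(self, learn_steps=20, window=10, threshold=0.5):
        self.learn_steps = learn_steps      # observations used to fit the model
        self.threshold = threshold          # error rate that triggers re-learning
        self.history = []                   # actions seen during the learning phase
        self.model = None                   # predicted opponent action, once learned
        self.errors = deque(maxlen=window)  # recent prediction errors (True = wrong)

    def observe(self, action):
        """Feed one opponent action; return True if a switch is flagged."""
        if self.model is None:
            # Learning phase: collect data, then fit the (assumed) frequency model.
            self.history.append(action)
            if len(self.history) >= self.learn_steps:
                self.model = Counter(self.history).most_common(1)[0][0]
            return False
        # Monitoring phase: track how often the model mispredicts.
        self.errors.append(action != self.model)
        window_full = len(self.errors) == self.errors.maxlen
        if window_full and sum(self.errors) / len(self.errors) > self.threshold:
            # Flag a switch and start learning a new model from scratch.
            self.history.clear()
            self.errors.clear()
            self.model = None
            return True
        return False


# Hypothetical opponent: cooperates ("C") for 40 rounds, then defects ("D") for 40.
opponent_actions = ["C"] * 40 + ["D"] * 40
detector = SwitchDetector()
flags = [t for t, a in enumerate(opponent_actions) if detector.observe(a)]
```

With these settings the detector flags a single switch a few rounds after the opponent changes behavior, then learns a new model from the subsequent observations. A smaller window or threshold would react faster at the cost of more false alarms when the opponent's strategy is stochastic.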

    A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

    The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, each making a variety of implicit assumptions, so that it is hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories together with key characteristics of the environment (e.g., observability) and the adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further, we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research.

    Oceanographic processes and products around the Iberian margin: a new multidisciplinary approach

    Our understanding of the role of bottom currents and associated oceanographic processes (e.g., overflows, barotropic tidal currents), including intermittent processes (e.g., vertical eddies, deep-sea storms, horizontal vortices, internal waves and tsunamis), is rapidly evolving. Many deep-water processes remain poorly understood due to limited direct observations, but may generate significant depositional and erosional features on both short- and long-term time scales. This paper describes these oceanographic processes and examines their potential role in the sedimentary features around the Iberian margin. The paper explores the implications of the processes studied, given their secondary role relative to other factors such as mass-transport and turbiditic processes. An integrated interpretation of these oceanographic processes requires an understanding of contourites, sea-floor features, their spatial and temporal evolution, and the near-bottom flows that form them. Given their complex, three-dimensional and temporally variable nature, integration of these processes into sedimentary, oceanographic and climatological frameworks will require a multidisciplinary approach that includes Geology, Physical Oceanography, Paleoceanography and Benthic Biology. This approach will synthesize oceanographic data, seafloor morphology, sediments and seismic images to improve our knowledge of permanent and intermittent processes around Iberia, and evaluate their conceptual and regional role in the sedimentary evolution of the margin. © 2015, Instituto Geologico y Minero de Espana. All rights reserved.

    Knowledge of the role of bottom currents and associated oceanographic processes (overflows, barotropic tidal currents, etc.), including intermittent processes (eddies, deep-sea storms, internal waves, tsunamis, etc.), is evolving rapidly. Many of these processes are poorly understood, partly because direct observations are limited, although they can generate significant depositional and/or erosional features on short- or long-term time scales. This article describes these oceanographic processes and examines their influence on the presence of sedimentary features around the Iberian margin. The work discusses the implications of these processes and the secondary role they play relative to other factors such as gravitational mass-transport and turbiditic processes. A better understanding of deep-marine sedimentation, and of contourite systems in particular, requires an interpretation of these oceanographic processes: their spatial and temporal evolution, how they affect bottom currents, and how they are affected by submarine topography. However, given their complexity and their variable three-dimensional and temporal nature, these processes need to be integrated into a sedimentological, oceanographic and climatological framework with a multidisciplinary approach that includes Geology, Physical Oceanography, Paleoceanography and Benthic Biology. This integration requires a larger compilation of oceanographic data, better knowledge of seafloor morphology, and better characterization of sediments in deep environments. All of this will improve our knowledge of the permanent and intermittent processes around Iberia and allow their true effect on the sedimentary evolution of the surrounding continental margins to be evaluated.

    Mutations in lectin complement pathway genes COLEC11 and MASP1 cause 3MC syndrome

    Published version available at https://www.nature.com/articles/ng.757#Ack1
    This work was supported in part by grants from NEWLIFE (P.L.B., A.D.-F. and C.R.), the Wellcome Trust (P.L.B.), Dubai Harvard Foundation for Medical Research (F.S.A.), the University Hospital of Bordeaux (C.R.), the UK Medical Research Council (A.W.) and EU-FP7 (201804-EUCILIA) (V.H.-H., D.J. and D.P.S.O.). This research was supported by the National Institute for Health Research Biomedical Research Centre at Great Ormond Street Hospital for Children NHS Foundation Trust and University College London (P.L.B.). P.L.B. is a Wellcome Trust Senior Research Fellow.

    Quantitative image analysis of polyhydroxyalkanoates inclusions from microbial mixed cultures under different SBR operation strategies

    Polyhydroxyalkanoates (PHAs) produced from mixed microbial cultures (MMC), regarded as potential substitutes for petrochemical plastics, can be found as intracellular granules in various microorganisms under limited nutrient conditions and an excess of carbon source. PHA is traditionally quantified by laborious and time-consuming chromatography analysis, so a simpler and faster method to assess PHA contents from MMC, such as quantitative image analysis (QIA), is of great interest. The main purpose of the present work was to upgrade a previously developed QIA methodology (Mesquita et al., 2013a, 2015) for quantifying MMC intracellular PHA contents, to increase the studied intracellular PHA concentration range and to extend it to different sequencing batch reactor (SBR) operation strategies. To that end, the operation of a new aerobic dynamic feeding (ADF) SBR allowed the studied operating conditions, dataset, and range of MMC intracellular PHA contents to be extended beyond the previously reported anaerobic/aerobic cycle SBR. Nile Blue A (NBA) staining was employed for epifluorescence microscope visualization and image acquisition, and the images were then fed to a custom-developed QIA. Data from each of the feast and famine cycles of both SBRs were individually processed using chemometric analysis, obtaining the corresponding partial least squares (PLS) models. The PHA concentrations determined from the PLS models were then plotted against the results obtained with the standard chromatographic method. For both SBRs the prediction ability was higher at the end of the feast stage than for the famine stage. Indeed, independent treatment of the feast and famine QIA data was found to be fundamental to obtaining the best prediction abilities. Furthermore, a promising overall correlation (R² of 0.83) was found when combining all QIA data for PHA prediction up to a concentration of 1785.1 mg L⁻¹ (37.3 wt%). Thus, the results confirm that the presented QIA methodology is promising for estimating higher intracellular PHA concentrations across a larger set of reactor operation systems, further extending the prediction range of previous studies.

    This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE01-0145-FEDER-000004) funded by European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte. The authors also acknowledge the financial support to Cristiano S. Leal (PTDC/EBB-EBI/103147/2008, FCOMP-01-0124-FEDER009704) and Daniela P. Mesquita through the FCT postdoctoral grant (SFRH/BPD/82558/2011).